# CLIP Architecture Optimization
## vit_giant_patch14_clip_224.laion2b

**Publisher:** timm · **License:** Apache-2.0 · **Task:** Image Classification · **Tags:** Transformers · **Downloads:** 71 · **Likes:** 0

Vision Transformer (ViT-G/14) image encoder from the CLIP architecture, pretrained on the LAION-2B dataset for image feature extraction. A loading sketch follows the next entry.
## convnext_large_mlp.clip_laion2b_ft_soup_320

**Publisher:** timm · **License:** Apache-2.0 · **Task:** Image Classification · **Tags:** Transformers · **Downloads:** 173 · **Likes:** 0

ConvNeXt-Large image encoder from the CLIP architecture, fine-tuned on the LAION-2B dataset and supporting image feature extraction at 320x320 resolution. Both timm encoders in this section load the same way, as shown below.
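A minimal feature-extraction sketch for the two timm encoders above, assuming the model names shown in the headings are available in your installed timm version (verify with `timm.list_models("*clip*")`):

```python
import timm
import torch

# Load the CLIP-pretrained ViT-G/14 image tower; num_classes=0 removes the
# classification head so the model returns pooled image features.
model = timm.create_model(
    "vit_giant_patch14_clip_224.laion2b",  # or "convnext_large_mlp.clip_laion2b_ft_soup_320"
    pretrained=True,
    num_classes=0,
).eval()

# Resolve the preprocessing that matches the pretrained weights (resize,
# crop, and normalization are read from the model's pretrained config).
data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

with torch.inference_mode():
    # `transform` would be applied to a PIL image; a random tensor of the
    # model's expected input size (320x320 for the ConvNeXt variant) stands
    # in here.
    x = torch.randn(1, *data_config["input_size"])
    features = model(x)

print(features.shape)  # (1, feature_dim) pooled image embedding
```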
## QuiltNet-B-16-PMB

**Publisher:** wisdomik · **License:** MIT · **Task:** Image-to-Text · **Downloads:** 513 · **Likes:** 5

A multimodal foundation model pairing a ViT-B/16 visual encoder with a PubMedBERT text encoder, trained on the Quilt-1M pathology video dataset. A zero-shot usage sketch follows the next entry.
## QuiltNet-B-32

**Publisher:** wisdomik · **License:** MIT · **Task:** Text-to-Image · **Downloads:** 8,442 · **Likes:** 22

A CLIP ViT-B/32 vision-language foundation model trained on the Quilt-1M pathology video dataset, designed specifically for histological analysis.
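A hedged zero-shot classification sketch for QuiltNet-B-32 using the standard Transformers CLIP classes, assuming the `wisdomik/QuiltNet-B-32` repository ships a CLIP-compatible config (check the model card; the B-16-PMB variant pairs a PubMedBERT text tower and may require the authors' own loading code instead):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_ID = "wisdomik/QuiltNet-B-32"  # assumed hub id; confirm on the model card

model = CLIPModel.from_pretrained(MODEL_ID).eval()
processor = CLIPProcessor.from_pretrained(MODEL_ID)

# A blank image stands in for a real histopathology tile.
image = Image.new("RGB", (224, 224), color="white")
labels = ["adenocarcinoma", "normal colonic mucosa", "squamous cell carcinoma"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.inference_mode():
    outputs = model(**inputs)

# Image-text similarity scores, softmaxed over the candidate labels.
probs = outputs.logits_per_image.softmax(dim=-1)
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```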
## AltCLIP-m9

**Publisher:** BAAI · **License:** OpenRAIL · **Task:** Text-to-Image · **Tags:** Transformers, Multilingual · **Downloads:** 25 · **Likes:** 8

AltCLIP-m9 is a multilingual CLIP model supporting nine languages, intended as a text/image encoder for multilingual text-to-image systems.
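Transformers ships dedicated AltCLIP classes. A minimal multilingual similarity sketch, assuming `BAAI/AltCLIP-m9` is the hub id for the nine-language checkpoint:

```python
import torch
from PIL import Image
from transformers import AltCLIPModel, AltCLIPProcessor

MODEL_ID = "BAAI/AltCLIP-m9"  # assumed hub id; confirm on the hub

model = AltCLIPModel.from_pretrained(MODEL_ID).eval()
processor = AltCLIPProcessor.from_pretrained(MODEL_ID)

image = Image.new("RGB", (224, 224), color="gray")  # stand-in for a real photo
# The same caption in three of the supported languages.
texts = ["a photo of a cat", "un chat", "一只猫"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.inference_mode():
    outputs = model(**inputs)

# Scores should be close across languages for a well-aligned multilingual encoder.
print(outputs.logits_per_image.softmax(dim=-1))
```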